
refactor: extract pdf-to-latex-swift + ocr-swift + flip gitignore strategy (#79) #89

Merged: kiki830621 merged 5 commits into main from refactor/extract-remaining-packages-79 on Apr 19, 2026

@kiki830621 (Collaborator)

Closes #79.

5 commits on this branch

  1. 9170093 Track A — swap pdf-to-latex-swift path dep for PsychQuant/pdf-to-latex-swift v0.1.0
  2. 3a2806a Track B — extract ocr-swift to PsychQuant/ocr-swift v0.1.0
  3. 6fd4880 Track C — delete empty WordToMDTests testTarget
  4. a380cfc Track D fix — flip .gitignore strategy (exclude build products within packages/, not whole packages/)
  5. fe1f2b9 Track E — update CLAUDE.md Sub-Repositories table

Scope expansion during apply

Track D clean-clone verification surfaced a systemic issue the diagnosis didn't anticipate: many packages besides pdf-to-latex-swift and ocr-swift were also missing their Sources from git. Root cause: the old .gitignore excluded packages/ wholesale and re-included files through a per-file whitelist, but many packages' source files were never added to that whitelist. Consequence: a clean clone could not build, and had not been able to for a long time.

Rather than whitelisting each package one by one, commit a380cfc flipped the strategy: source under packages/*/ is tracked by default, and only build products and remote-dependency cache residue are excluded. This single change brought 23 previously untracked files under version control, across:

  • packages/html-to-word-swift/Sources + Tests
  • packages/md-to-word-swift/Sources + Tests
  • packages/pdf-to-docx-swift/Sources + Tests
  • packages/pdf-to-md-swift/Sources + Tests
  • packages/word-to-html-swift/Sources + Tests
  • packages/bib-apa-swift/Sources + Tests (transitive dependency of the three bib-apa-to-* packages)
  • packages/bib-apa-to-html-swift/Sources + Tests
  • packages/bib-apa-to-json-swift/Sources + Tests
  • packages/bib-apa-to-md-swift/Sources + Tests
  • packages/biblatex-apa-swift/LICENSE and others
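A toy Swift model makes the failure mode concrete. It is not git's pattern matcher: it compares plain path strings, and the excluded directory names (.build, .swiftpm) and the sample file paths are assumptions for illustration, not the literal rules committed in a380cfc.

```swift
// Toy model of the two .gitignore strategies; NOT git's real matcher.
// The excluded directory names and sample paths are illustrative assumptions.

/// Old strategy: everything under `packages/` is excluded unless the exact
/// file path was added to the whitelist.
func isTrackedOld(_ path: String, whitelist: Set<String>) -> Bool {
    guard path.hasPrefix("packages/") else { return true }
    return whitelist.contains(path)
}

/// New strategy: everything under `packages/` is tracked by default; only
/// paths passing through an excluded build/cache directory are ignored.
let excludedDirNames: Set<String> = [".build", ".swiftpm"]  // assumed names

func isTrackedNew(_ path: String) -> Bool {
    guard path.hasPrefix("packages/") else { return true }
    let components = path.split(separator: "/").map(String.init)
    return !components.contains { excludedDirNames.contains($0) }
}

// Hypothetical paths for illustration.
let source = "packages/bib-apa-to-md-swift/Sources/Example/Example.swift"
let buildArtifact = "packages/bib-apa-to-md-swift/.build/debug/Example.o"

// A source file nobody remembered to whitelist is dropped by the old rules...
print(isTrackedOld(source, whitelist: []))  // false
// ...but tracked by default under the new rules, while build output stays ignored.
print(isTrackedNew(source))                 // true
print(isTrackedNew(buildArtifact))          // false
```

The asymmetry is the point: under a blanket exclude plus whitelist, forgetting a file fails silently until someone builds from a clean clone, whereas under default tracking the exclusion list only has to name build output, so new sources are picked up without extra bookkeeping.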

Verification

New remote repo

…x-swift v0.1.0 (#79 Track A)

Track A of #79 — formalize the existing PsychQuant/pdf-to-latex-swift
remote that was pushed 2026-04-16 but never version-tagged.

Discovery during spectra-discuss: diff -rq confirmed that the local and remote
copies are byte-identical, so reconciliation reduces to ceremony:

1. git tag v0.1.0 + push origin v0.1.0 (done on remote)
2. Package.swift: path dep → url dep (from: 0.1.0)
3. swift package update pdf-to-latex-swift
4. Delete local packages/pdf-to-latex-swift/ (no longer needed —
   SPM resolves from URL)
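Step 2 amounts to a one-line change in the dependency list. A minimal sketch, assuming a trimmed manifest: the package and target name MacDocSketch, the product name PDFToLaTeX, the tools version, the platform, and the github.com URL form are illustrative assumptions; only the repository PsychQuant/pdf-to-latex-swift and the 0.1.0 lower bound come from this commit.

```swift
// swift-tools-version:5.9
// Sketch of Track A step 2, trimmed to a single dependency.
// "MacDocSketch", the product name "PDFToLaTeX", the platform, and the
// github.com URL form are hypothetical; the repo path and 0.1.0 come from this PR.
import PackageDescription

let package = Package(
    name: "MacDocSketch",
    platforms: [.macOS(.v14)],
    dependencies: [
        // Before: a local path dependency on packages/pdf-to-latex-swift/.
        // After: resolve the v0.1.0 tag (or any later 0.x release) from the remote.
        .package(url: "https://github.com/PsychQuant/pdf-to-latex-swift.git", from: "0.1.0"),
    ],
    targets: [
        .executableTarget(
            name: "MacDocSketch",
            dependencies: [
                .product(name: "PDFToLaTeX", package: "pdf-to-latex-swift"),
            ]
        ),
    ]
)
```

from: "0.1.0" is shorthand for .upToNextMajor(from: "0.1.0"), so any release in 0.1.0..<1.0.0 satisfies it, and the exact resolved version is recorded in Package.resolved. Running swift package update with just this package name (step 3) should re-resolve only that dependency, which is how the other existing pins can stay where they are.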

Pre-existing dependency pins are preserved by design (to avoid a Package.resolved
cascade): note-core-swift@0.1.3, note-to-pdf-swift@0.1.2,
word-builder-swift@0.9.0, ooxml-swift@0.7.0.

swift build: clean link, no consumer breakage in MacDoc+PDF.swift,
MacDoc+PDF+Phase2.swift, MacDoc+Config.swift, MacDoc+OCR.swift.

Refs #79

Track B of #79: clean-slate extraction mirroring the note-* pattern.

Changes:
- packages/ocr-swift/ -> PsychQuant/ocr-swift v0.1.0 (new public repo)
  - 5 Swift sources (OCRPipeline, OCRBackend, MLXBackend, OllamaBackend,
    PDFKitExtractor) — byte-identical to previous local version
  - .gitignore excludes .build/, .swiftpm/, Package.resolved
- Package.swift: .package(name: "OCRSwift", path: ...) -> .package(url:)
- Package.swift: product ref .product(name: "OCRCore", package: "OCRSwift")
  -> package: "ocr-swift" (lowercase identity matches remote URL basename)
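The product-reference edit is the subtle part. The old reference resolved through the explicit name: "OCRSwift" on the path dependency; a URL dependency declared without name: is referenced by its package identity, which SwiftPM derives from the last path component of the URL, lowercased and without .git. A minimal sketch, with hypothetical package and target names and an assumed github.com URL and platform:

```swift
// swift-tools-version:5.9
// Sketch of the Track B manifest change. "MacDocSketch", the platform, and
// the github.com URL form are assumptions; OCRCore and 0.1.0 come from this PR.
import PackageDescription

let package = Package(
    name: "MacDocSketch",
    platforms: [.macOS(.v14)],
    dependencies: [
        // Before: .package(name: "OCRSwift", path: ...)
        .package(url: "https://github.com/PsychQuant/ocr-swift.git", from: "0.1.0"),
    ],
    targets: [
        .executableTarget(
            name: "MacDocSketch",
            dependencies: [
                // Before: .product(name: "OCRCore", package: "OCRSwift")
                // No name: on the URL dependency, so the reference uses the
                // identity derived from the URL basename: "ocr-swift".
                .product(name: "OCRCore", package: "ocr-swift"),
            ]
        ),
    ]
)
```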

Critical: PR #84's Qwen3-VL/OCRBackend wiring in PageOCRRunner.swift was
specifically verified to compile against MLXBackend(modelConfig:) /
OllamaBackend(host:) / OCRPipeline(backend:) from the new url dep.
swift build: 6.28s clean link, no consumer breakage.

Pre-existing dependency pins are preserved: note-core-swift@0.1.3,
note-to-pdf-swift@0.1.2, word-builder-swift@0.9.0, ooxml-swift@0.7.0,
pdf-to-latex-swift@0.1.0.

Refs #79

Track C of #79: delete dead scaffolding that declared a testTarget
with .copy("Fixtures") resources but had zero test files and an empty
Fixtures/ subdirectory. This matches reality (no tests exist) and follows
#81's "real tests only" philosophy.

If WordToMDSwift coverage is wanted later, a separate issue can
create real smoke tests with proper fixtures — not placeholder
scaffolding that looks like infrastructure but produces no assertions.

swift test: 28 tests still green (26 Swift Testing + 2 XCTest), no
regression.

Refs #79

…79 Track D fix)

Track D clean-clone verification surfaced 3 additional gitignored
packages that #79's original scope didn't anticipate:
- packages/bib-apa-to-html-swift (20 KB source)
- packages/bib-apa-to-json-swift (4 KB source)
- packages/bib-apa-to-md-swift (12 KB source)

All 3 were matched by .gitignore's blanket 'packages/' rule and
referenced by macdoc's Package.swift as .package(name:path:) deps,
the same pattern as #78's note-* and #79's pdf-to-latex/ocr-swift.

Decision: Option (b) from issue #79 — whitelist + commit in-tree
(matches srt-to-html-swift, md-to-html-swift, html-to-md-swift,
marker-word-converter-swift, tex-to-docx-swift precedent). These are
small (36 KB total), don't have a compelling independent-repo story,
and the 2 existing private remotes under apa-bib-* naming would
require rename + make-public ceremony for no real benefit.

This completes #79's Phase 5 audit and unblocks clean-clone
'git clone && swift build' verification.

Refs #79

Track E of #79: add rows for the newly extracted remote repos:
- pdf-to-latex-swift @ PsychQuant/pdf-to-latex-swift (from Track A)
- ocr-swift @ PsychQuant/ocr-swift (from Track B)

Sub-Repositories table now reflects the remote url-dep status.

Refs #79

Development

Successfully merging this pull request may close these issues.

refactor: extract pdf-to-latex-swift + ocr-swift to PsychQuant repos; commit Tests/WordToMDTests fixtures (continues #78 pattern)
